Comparison of Different 2D and 3D-QSAR Methods on Activity Prediction of Histamine H3 Receptor Antagonists.

Histamine H3 receptor subtype has been the target of several recent drug development programs. Quantitative structure-activity relationship (QSAR) methods are used to predict the pharmaceutically relevant properties of drug candidates wherever applicable. The aim of this study was to compare the predictive powers of three different QSAR techniques, namely, multiple linear regression (MLR), artificial neural network (ANN), and HASL as a 3D-QSAR method, in predicting the receptor binding affinities of arylbenzofuran histamine H3 receptor antagonists. Genetic algorithm coupled partial least squares as well as stepwise multiple regression methods were used to select a number of calculated molecular descriptors to be used in MLR- and ANN-based QSAR studies. The performances of the MLR and ANN methods were evaluated using the leave-group-out cross-validation technique. The calculated values for the mean absolute percentage error (MAPE), ranging from 2.9 to 3.6, and standard deviation of error of prediction (SDEP), ranging from 0.31 to 0.36, for both MLR and ANN methods were statistically comparable, indicating that both methods perform equally well in predicting the binding affinities of the studied compounds toward the H3 receptors. On the other hand, the results from 3D-QSAR studies using the HASL method were not as good as those obtained by the 2D methods. It can be concluded that simple traditional approaches such as MLR can be as reliable as more advanced and sophisticated methods like ANN and 3D-QSAR analyses.


Introduction
Histamine is a hydrophilic biological amine which is widely distributed throughout the animal kingdom. Almost all mammalian tissues contain histamine in varying amounts and, consistent with its wide tissue distribution, it is involved in many important physiological functions such as allergic responses and regulation of gastric acid secretion (1). In the peripheral and central nervous systems, it functions as a neurotransmitter (2). The wide range of physiological effects of histamine results from its recognition by specific cell-surface receptors belonging to the G-protein coupled receptor superfamily. Pharmacological investigations suggest the existence of multiple receptor subtypes for histamine. Up to now, four different receptors have been cloned and designated histamine H1 to H4 receptor subtypes (3-6). The histamine H3 receptor, identified in 1983, regulates the synthesis and release of histamine through a negative feedback mechanism (7). Histamine H3 receptors also modulate the release of several neurotransmitters such as glutamate, acetylcholine, noradrenaline, dopamine, GABA and serotonin. It has been suggested that H3 receptor antagonists may play a role in the treatment of several neurological diseases such as epilepsy, obesity, arousal, attention-deficit hyperactivity disorder (ADHD), schizophrenia, Alzheimer's and Parkinson's diseases. Thus, finding potent and efficacious H3 receptor antagonists has been the focus of several recent drug development programs (8-12).
Computer-aided drug discovery techniques have a tremendous effect in shortening the process of drug discovery investigations (13,14). Among different computational techniques, the quantitative structure-activity relationship (QSAR) methods are certainly major factors in contemporary drug design. Thus, it is quite clear why the industrial units are the prime users of the QSAR methods (15). Different two- and three-dimensional QSAR techniques, such as methods based on multiple linear regression (MLR), principal component analysis (PCA), artificial neural networks (ANN) (16), and 3D grid-based methods, like hypothetical active site lattice (HASL) and comparative molecular field analysis (CoMFA), are used to quantitatively predict the desired properties. ANN is a learning system based on a computational technique which attempts to simulate the neurological processing ability of the brain (17). Recently, evolutionary methods such as the genetic algorithm (GA) have received increasing attention for variable selection. The 3D-QSAR methods apply empirical force field calculations to three-dimensionally aligned ligand structures. The alignments are mostly guided by the exploration of crystallographically solved ligand-receptor complexes or by direct superpositioning of the ligands. CoMFA and HASL are among the many available 3D-QSAR methods. CoMFA uses both interactive graphics and statistical techniques to correlate the shapes and properties of molecules with their biological activity (18,19). The HASL technique creates a QSAR model from a composite lattice generated from a series of regular orthogonal 3D grids established for each molecule (20,21).
In the present work, different QSAR approaches, i.e., MLR, ANN and HASL, were used to model the receptor binding affinities of 58 arylbenzofuran-derived H3 receptor antagonists, and the predictive powers of the methods were then compared.

Experimental
Biological data
Fifty-eight arylbenzofuran derivatives with histamine H3 antagonistic activities were used in QSAR analyses (Table 1). Their binding affinities to human and rat H3 receptors are shown in Tables 2 and 3, respectively (22).

Molecular descriptors
Molecular descriptors were calculated as previously described (23). Briefly, the Hyperchem software (ver. 7.0) was used to generate 3D molecular structures and to energy-minimize them using the MM+ force field (24). Then, the structures were fully optimized by the semiempirical method at the AM1 level of theory (25). Hyperchem, Dragon (version 3.0) and the ACDlabs suite of programs (ver. 6.00) were employed to calculate the molecular descriptors. HOMO and LUMO energies, molar refractivity, hydration energy, Log P, dipole moment, surface area and total energy were calculated using Hyperchem. From the 1481 different 1D, 2D and 3D molecular descriptors calculated by the Dragon software, those with pairwise correlations below 0.95 were retained for further analyses (26).
Other descriptors such as Log D at different pH values, pKa, molar volume, parachor, density, surface tension and Hansch substituent hydrophobicity constant (π) were computed using ACDlabs software.
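The correlation-based reduction of the Dragon descriptors can be illustrated with a short Python sketch. It is not the procedure actually run in Dragon: the greedy keep-first rule, the function names (pearson, filter_correlated) and the toy descriptor values are assumptions made for illustration, and only the 0.95 cutoff is taken from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant column has no defined correlation; treated as 0 here.
        return 0.0
    return sxy / sqrt(sxx * syy)


def filter_correlated(descriptors, threshold=0.95):
    """Keep a descriptor only if |r| < threshold against every descriptor
    already kept (greedy, in insertion order; an assumed rule)."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor columns (one value per compound).
demo = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 4.0, 6.2, 8.1, 9.9],  # almost proportional to d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept_names = filter_correlated(demo)
print(kept_names)
```

In this toy set d2 is nearly collinear with d1 and is dropped, while d3 (r = -0.3 against d1) is retained.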

Descriptor selection
In order to select the minimum number of molecular descriptors to be used in the modeling steps, the genetic algorithm coupled partial least square (GA-PLS) method of Riccardo Leardi was used in MATLAB environment (ver.7.0) with the following setup: population size, 30;

Table 2. Observed binding affinities, pKi(obs), of the substituted arylbenzofurans to the cloned human H3 receptors expressed stably in C6 cells. The pKi(pred) values for the MLR and ANN methods are the predicted affinities obtained in the leave-group-out (LGO) cross-validation study. Columns: Compound; pKi(obs); pKi(pred), LGO-MLR; pKi(pred), LGO-ANN.

MLR model
The procedure for MLR method was performed using SPSS (ver 11.5) program as described previously (23). Briefly, the reduced data set was subjected to stepwise regression analysis to further select a limited number of descriptors significantly contributing to the prediction of binding affinities of H3 antagonists.

ANN model
A network with a sigmoidal transfer function, trained by gradient descent with momentum and adaptive learning rate back-propagation, was designed to predict the biological activities of the H3 antagonists used in this study. The back-propagation learning algorithm is the most widely used training algorithm in multilayered feed-forward networks (17). All ANN calculations were carried out using MATLAB software with the ANN toolbox for Windows running on a Pentium 4 personal computer.
Before the training process, the input and output values were normalized between 0.1 and 0.9. After simulation, the predicted values were transformed back to the true values. The inputs and outputs for the ANN simulation were the values of the molecular descriptors selected by the MLR method and the pKi values, respectively. The number of neurons in the hidden layer was varied from 2 to 7, and the layer consisting of 5 neurons gave the optimum results. The training parameters used in this work were as follows: the training function was traingdm; learning rate = 0.1; momentum = 0.9; and the default values were accepted for the other parameters.
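The normalization to the 0.1 to 0.9 range and the back-transformation of the predictions can be sketched as below. The text does not state the exact scaling formula, so a linear min-max mapping is assumed here; the function names and the pKi values are hypothetical.

```python
def fit_range(values):
    """Return the (min, max) of the training values used for scaling."""
    return min(values), max(values)


def scale(values, lo, hi, a=0.1, b=0.9):
    """Map values linearly from [lo, hi] to [a, b] (assumed min-max scaling)."""
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [a + (b - a) * (v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi, a=0.1, b=0.9):
    """Inverse of scale(): map network outputs back to the original units."""
    return [lo + (s - a) * (hi - lo) / (b - a) for s in scaled]


# Hypothetical pKi values, for illustration only.
pki = [7.2, 8.0, 8.8, 9.4]
lo, hi = fit_range(pki)
scaled = scale(pki, lo, hi)
restored = unscale(scaled, lo, hi)
print(scaled)
print(restored)
```

Keeping the targets away from 0 and 1 is a common choice with sigmoidal output units, whose outputs only approach those limits asymptotically.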

Method validation
Predictive power of the QSAR methods was assessed by the leave-group-out cross-validation technique, and the q² values were calculated using the following equation:
\[ q^2 = \frac{\mathrm{SSD} - \mathrm{PRESS}}{\mathrm{SSD}} = 1 - \frac{\mathrm{PRESS}}{\mathrm{SSD}} \qquad (1) \]
Here SSD is the sum of squared deviations of each actual activity value \(y_i\) (pKi(obs)) from the average activity \(\bar{y}\) over the entire data set. PRESS, the predictive sum of squares, is the sum of the squared differences between the actual activity \(y_i\) and the predicted activity \(\tilde{y}_i\) (pKi(pred)).
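The absolute percentage error (APE) of prediction was calculated for each data point and averaged using equations 2 and 3, respectively:
\[ \mathrm{APE}_i = \frac{\left| pK_{i(\mathrm{pred})} - pK_{i(\mathrm{obs})} \right|}{pK_{i(\mathrm{obs})}} \times 100 \qquad (2) \]
\[ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{APE}_i \qquad (3) \]
Here, pKi(pred) and pKi(obs) are the predicted and observed binding affinities and n denotes the number of compounds. MAPE is the mean of the APE values. Moreover, the standard deviation of error of prediction (SDEP) was calculated to assess the distribution of error levels for the rat and human data using the following equation:
\[ \mathrm{SDEP} = \sqrt{\frac{\sum_{i=1}^{n} \left( pK_{i(\mathrm{pred})} - pK_{i(\mathrm{obs})} \right)^2}{n}} \qquad (4) \]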
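As an illustration of how these validation statistics are computed, the following Python sketch evaluates q², MAPE and SDEP for a small set of observed and predicted values. It assumes the conventional definitions (q² = 1 - PRESS/SSD, APE relative to the observed value, SDEP as the square root of PRESS/n); the function names and the numbers are hypothetical and are not data from this study.

```python
from math import sqrt


def q2(obs, pred):
    """Cross-validated q2 = 1 - PRESS / SSD."""
    mean_obs = sum(obs) / len(obs)
    ssd = sum((y - mean_obs) ** 2 for y in obs)
    press = sum((y - p) ** 2 for y, p in zip(obs, pred))
    return 1.0 - press / ssd


def mape(obs, pred):
    """Mean of the absolute percentage errors, relative to observed values."""
    apes = [abs(p - y) / abs(y) * 100.0 for y, p in zip(obs, pred)]
    return sum(apes) / len(apes)


def sdep(obs, pred):
    """Standard deviation of error of prediction, sqrt(PRESS / n)."""
    press = sum((y - p) ** 2 for y, p in zip(obs, pred))
    return sqrt(press / len(obs))


# Hypothetical observed and cross-validated predicted pKi values.
obs = [8.0, 9.0, 7.0, 8.0]
pred = [8.2, 8.8, 7.1, 8.1]
print(q2(obs, pred), mape(obs, pred), sdep(obs, pred))
```

The predicted values passed to these functions should be the ones obtained while the corresponding compounds were left out of model building; otherwise the result is a fitted r² rather than a cross-validated q².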

3D-QSAR study
Histamine H3 antagonists were superimposed using the following approaches. (i) Energy-minimized molecules were superimposed on three atoms of the arylbenzofuran substructure common to all molecules, using the Overlay option of the Hyperchem program. (ii) Using the MOE program (2007.09), one of the molecules was opened and then the second molecule was superimposed onto it with all options set to default. In the subsequent stage, the previously opened and superimposed molecules were frozen and the third molecule was loaded and superimposed onto them. This process of freezing the superimposed molecules and then loading and superimposing a new molecule onto them was continued until all molecules were superimposed. (iii) In a different strategy from methods (i) and (ii), we aimed to guide the superpositioning of the ligands by taking into account their relative conformations after docking them into the binding site of the histamine H3 receptor molecular model developed elsewhere (23). Flexible docking of all compounds under investigation was carried out using the GOLD program (version 2.0) running on Windows XP. The HASL method (version 3.30) was then used to generate 3D-QSAR models from the ligands aligned according to the procedures outlined above (20, 21). The ligands were randomly divided into training and test compounds. The training set was used to generate a 3D-QSAR model in order to predict the biological activity of the test set compounds.

Results and Discussion
Table 1 shows the chemical structures of the 58 arylbenzofuran derivatives with H3 receptor antagonist activities used in this study. The table also contains the values of several molecular descriptors calculated for the structures. These descriptors were selected during the different steps of the data reduction procedure using the GA-PLS and MLR methods, as outlined in the Experimental section. The aim was to use not more than four descriptors in the models.
The selected descriptors are the energy of the highest occupied molecular orbital (E(HOMO)), the apparent distribution coefficient at pH 7.4 (LogD at pH 7.4) and two different 3D-MoRSE descriptors (Mor19V and Mor30M) for the human data set, and LogD at pH 7.4, a 3D-MoRSE descriptor (Mor18U), the MAXDP topological descriptor and the fragment-based polar surface area (PSA) for the rat data set. Equations 5 and 6 describe the ligand binding affinities to human and rat H3 receptors, respectively, based on the four selected molecular parameters for each correlation.
Here, n (number of data), r² (squared correlation coefficient), F (F-value) and SE (standard error) are model statistics. The significance of these molecular descriptors in describing the observed binding affinities was discussed elsewhere (23).
To process any nonlinear relationships existing between the activity and the descriptors, the ANN modeling method was employed. The ANN model was generated using the descriptors appearing in the MLR models as inputs. A 4-5-1 neural network was developed with the optimum momentum and learning rate of 0.9 and 0.1, respectively.
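To make the 4-5-1 architecture concrete, the sketch below builds such a network in plain Python and runs one forward pass. It is only an illustration and not the MATLAB toolbox model used in the study: the random weight initialization, the use of a sigmoid on the output unit as well as on the hidden units, and all function names are assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic (sigmoidal) transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs=4, n_hidden=5, seed=0):
    """Random weights and biases for an n_inputs-n_hidden-1 network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": rng.uniform(-0.5, 0.5),
    }


def forward(net, x):
    """Propagate one normalized descriptor vector through the network."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    return sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden))
                   + net["b_out"])


def count_parameters(net):
    """Total number of adjustable weights and biases."""
    return (sum(len(row) for row in net["w_hidden"]) + len(net["b_hidden"])
            + len(net["w_out"]) + 1)


net = init_network()
n_params = count_parameters(net)
# Four hypothetical descriptor values, already scaled to the 0.1-0.9 range.
prediction = forward(net, [0.2, 0.5, 0.7, 0.4])
print(n_params, prediction)
```

Under these assumptions the network has 4 x 5 + 5 + 5 + 1 = 31 adjustable parameters, and its output lies between 0 and 1, so it has to be transformed back to the pKi scale after prediction.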
A leave-group-out (LGO) cross-validation technique was used to evaluate the predictive power of the MLR- and ANN-based QSAR methods used in this study.
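The leave-group-out scheme can be sketched as follows: the compounds are partitioned into groups, and each group in turn is withheld while the model is built on the rest. The number of groups and the random assignment used below are assumptions, since the group composition is not specified in the text; the function name is hypothetical.

```python
import random


def leave_group_out_splits(n_items, n_groups, seed=0):
    """Yield (train_indices, test_indices) pairs so that every item is
    left out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_groups groups of near-equal size.
    groups = [indices[g::n_groups] for g in range(n_groups)]
    for test in groups:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, sorted(test)


# 58 compounds as in this study; 6 groups is an arbitrary illustrative choice.
splits = list(leave_group_out_splits(58, 6))
for train, test in splits:
    print(len(train), len(test))
```

In each round a model would be fitted on the training indices only and used to predict the withheld compounds; pooling those predictions over all rounds gives the pKi(pred) values from which the cross-validated statistics are computed.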
The observed H3 receptor binding affinities of the ligands, pKi(obs), as well as their activities predicted using the leave-group-out cross-validation method, pKi(pred), are listed in Tables 2 and 3 for the human and rat data, respectively. The LGO q² values obtained for the MLR method are 0.70 and 0.79 for the human and rat datasets, respectively (Table 4). Using the ANN method for prediction of the binding affinities, the LGO q² values are 0.65 and 0.77 for the human and rat datasets, respectively. The MAPE and SDEP values in Table 4 were also used to compare the predictive capabilities of the MLR and ANN methods.
Results from the different superimposition methods applied to the studied arylbenzofuran H3 antagonists are depicted in Figure 1. The aligned molecules were divided into training and test sets, and the 3D-QSAR model was then developed using the HASL method based on the training set compounds. The activities of the test compounds were predicted using the obtained 3D-QSAR models (Tables 2 and 3) and then the absolute percentage errors of prediction were calculated (Table 4). Several rounds of model development were performed, and in each round the composition of the training and test sets was changed so that all of the compounds were given a chance to be used in the test set. The results indicate that the 3D-QSAR approaches used in this study were not successful in significantly predicting the biological activity of the test set compounds.
Histamine H3 receptors are autoreceptors that negatively regulate the release of histamine and other neurotransmitters such as norepinephrine, dopamine, and acetylcholine in the CNS and are believed to play a variety of physiological roles, including regulation of feeding, arousal, cognition, pain, and endocrine systems (29-31). Using the histamine H3 receptor antagonist clobenpropit, a neuroprotective role for the histamine H3 receptor was also reported due to increased GABA release (32). Since the discovery of the histamine H3 receptor in 1983 and the cloning of its cDNA in 1999, this receptor has gained the interest of many pharmaceutical companies as a potential drug target for the treatment of various important disorders, including obesity, attention-deficit hyperactivity disorder, Alzheimer's disease and schizophrenia, as well as myocardial ischemia, migraine and inflammatory diseases (33). Consequently, many synthetic studies have been conducted, leading to the preclinical development of structurally diverse H3 receptor antagonists as potential treatments for the above-mentioned disorders (8, 11, 34-38). However, drug development based on histamine H3 receptor antagonists lags far behind that for the H1 and H2 receptor antagonists, which have become successful blockbuster drugs for treating allergic conditions and gastric ulcers, respectively (39).
Figure 1. Alignments of arylbenzofuran derivatives generated by three different superpositioning approaches used in this study. Panel A shows the alignments obtained by flexible docking of molecules into the binding site of the structural model of histamine H3 receptor using GOLD program. Panels B and C are the results of superpositioning using HyperChem and MOE programs (see the text for further details).
The prediction of the biological activities of drug candidates is the main focus of many computer-aided drug discovery techniques. The pioneering work on quantitative structure-activity relationships was introduced by Hansch and coworkers in the form of MLR models. Since then, many different QSAR methods have been developed and used successfully in drug design and development. However, the MLR-based methods still remain among the useful computational techniques in drug development. Here we report QSAR studies on a set of arylbenzofuran H3 receptor antagonists using both 2D (i.e., MLR and ANN) and 3D (i.e., HASL) QSAR methods.
The purpose of QSAR studies is to select the biologically important structural descriptors and then identify the existing relations. We first used GA-PLS to reduce the number of structural features to a level manageable by the MLR method. MLR was then used in the final feature selection step. The number of descriptors was limited to four in order to prevent overfitting (fewer than one descriptor per 10 compounds was selected). Equations 5 and 6 represent the MLR models generated using the four most relevant descriptors for the human and rat datasets. Taking into account that the experimental procedures for obtaining the receptor affinities (pKi) for the human and rat datasets are not the same, and that the human and rat H3 receptors are not totally identical, the MLR models presented in equations 5 and 6 are reasonably similar. In our previous study we demonstrated the validity of the selected descriptors in modeling the H3 antagonist activities of the compounds used, and the results were in agreement with those of molecular modeling and ligand docking studies (23). The E(HOMO) term in equation 5 may indicate the presence of a charge transfer interaction between the benzofuran-attached phenyl group of the ligands and an aromatic residue of the receptor. In equation 6, the positive coefficient for MAXDP is indicative of a positive relationship between the electrophilicity of the polar moieties of the molecule and the binding affinity to the receptor, which could be related to the charge transfer capability of the molecule and be considered a descriptor equivalent to E(HOMO) in equation 5. In both equations 5 and 6 the relative hydrophobicity of the compounds (LogD at pH 7.4) is inversely related to the binding affinity. Different 3D-MoRSE descriptors, namely Mor19V, Mor30M and Mor18U, were included in MLR equations 5 and 6. These descriptors are related to the 3D structures of the molecules and, based on the weighting used in their calculation, they are related to the volume or mass of the molecules. It seems that the bigger the substituents of the molecule, the higher the affinity to the H3 receptors.
ANN analyses were also performed using the same set of descriptors as in the MLR method. The predictivities of the MLR and ANN methods were compared using the leave-group-out cross-validation technique. The calculated cross-validation q² coefficients as well as the MAPE and SDEP values for both MLR and ANN analyses are comparable, as shown in Table 4. The statistical treatment of the results shows that there is no significant difference between the MAPE values obtained for the human dataset using the MLR and ANN methods (p-value of 0.22 for the paired two-tailed t-test for the means). The same is also true for the rat dataset (p-value of 0.43). There are also no statistically significant differences between the variances of the prediction errors obtained by the MLR and ANN methods for either the human or rat datasets. From the numerically small values of SDEP it can be inferred that the errors are small and their distribution is not scattered.
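The paired comparison of the two methods can be sketched in Python as follows. The sketch computes only the paired t statistic and a variance ratio from two hypothetical sets of APE values; the numbers are not data from this study, the text does not name the test used for the variances (a variance-ratio F statistic is assumed here), and obtaining a p-value requires the t distribution, for example via scipy.stats.ttest_rel, which is not used in this standard-library example.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def variance_ratio(a, b):
    """F statistic as the ratio of the larger to the smaller sample variance."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


# Hypothetical APE values for the same five compounds under each method.
ape_mlr = [2.0, 3.5, 1.0, 4.0, 2.5]
ape_ann = [2.5, 3.0, 1.5, 4.5, 2.0]

t_stat, dof = paired_t(ape_mlr, ape_ann)
f_stat = variance_ratio(ape_mlr, ape_ann)
print(t_stat, dof, f_stat)
```

A paired test is the natural choice here because both methods predict the same compounds, so the compound-to-compound variation cancels in the differences.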
In order to perform 3D-QSAR analysis using the HASL algorithm, the ligands were first aligned using three different approaches, as described in the Experimental section. Briefly, in the first approach, Hyperchem was applied to align energy-minimized molecules by superimposing three atoms selected from the arylbenzofuran moiety common to all compounds. In this method the molecules were kept rigid. In the second approach, the MOE program was used for flexible alignment of the ligands based on all available similarity terms, such as hydrogen bond donor and acceptor, aromaticity, hydrophobicity, and partial charges. Thirdly, we used a docking approach to deduce the relative conformations and geometrical positions of the different ligands while bound to their binding site on the H3 receptor model built in the previous study (23). The aligned ligands and their corresponding activity values were fed into the HASL program to generate the QSAR model. The predictive power of the resulting 3D-QSAR model for the test set compounds was very poor. The calculated MAPE and SDEP values for the test compounds of the human data set were 9.39 and 1.00, respectively, and for the rat data set these values were calculated to be 10.50 and 0.96, respectively. The low predictive power of the 3D-QSAR analyses may be related to shortcomings of the theoretical receptor structure that we used for the docking-guided alignment procedure in the current study, in the absence of an experimentally derived structure for the hH3 receptor.
However, the other alignment protocols explained above also did not lead to satisfactory results. Thus, one might relate the lack of predictivity seen in the current 3D-QSAR study to the method used for the construction of the 3D models (i.e., HASL). Reinvestigation of the 3D analyses using other methodologies, such as CoMFA, may reveal more useful information.
In summary, the results of the current study demonstrate that both the MLR and ANN methods perform equally well in predicting the receptor binding affinities of the arylbenzofuran-derived histamine H3 receptor antagonists. Although the numerical values of q², MAPE and SDEP suggest that MLR performs marginally better, the difference is not statistically significant. Both of these 2D-QSAR methods were superior to HASL, a 3D-QSAR method, in predicting the activity of the arylbenzofuran H3 antagonists. The results presented in the current comparative study indicate that the application of more sophisticated and advanced methods in QSAR studies does not guarantee the best predictive outcome. In many cases, like the one presented in this work, much simpler and widely available techniques such as MLR can predict the property of interest (e.g., biological activity) equally well or even better than advanced methods such as ANN and 3D-based approaches.